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Statistical analysis and inference on spike trains is one of the cen¬ 
tral topics in the neural coding. It is of great interest to understand 
the underlying structure of given neural data. Based on the metric 
distances between spike trains, recent investigations have introduced 
the notion of an average or prototype spike train to characterize the 
template pattern in the neural activity. However, as those metrics lack 
certain Euclidean properties, the dehned averages are nonunique, and 
do not share the conventional properties of a mean. In this article, 
we propose a new framework to define the mean spike train where 
we adopt a Euclidean-like metric from an family. We demonstrate 
that this new mean spike train properly represents the average pat¬ 
tern in the conventional fashion, and can be effectively computed 
using a theoretically-proven convergent procedure. We compare this 
mean with other spike train averages and demonstrate its superiority. 
Furthermore, we apply the new framework in a recording from rodent 
geniculate ganglion, where background firing activity is a common is¬ 
sue for neural coding. We show that the proposed mean spike train 
can be utilized to remove the background noise and improve decoding 
performance. 


1. Introduction. Neural spike trains are often called the language of the 
brain and are the focus of many investigations in computational neuro¬ 
science. Due to the stochastic nature of the spike discharge record, proba¬ 
bilistic and statistical methods have been extensively investigated to exam¬ 
ine the underlying firing patterns [Rieke et al. (1999), Brown et al. (2002), 
Kass, Ventura and Brown (2005), Box, Hunter and Hunter (1978), Kass 
and Ventura (2001)]. However, these methods focus only on parametric rep¬ 
resentations at each given time and therefore can prove to be limited in 
data-driven problems in the space of spike trains. 
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Alternative approaches for analyzing spike train data are based on metri- 
cizing the spike train space. Over the past two decades, various methods 
have been developed to measure distances or dissimilarities between spike 
trains, for example, the distances in discrete state space, discrete time mod¬ 
els [Lim and Capranica (1994), MacLeod, Backer and Laurent (1998), Rieke 
et al. (1999)], those in discrete state space, continuous time models [Victor 
and Purpura (1996), Aronov et al. (2003), Aronov and Victor (2004), Victor, 
Goldberg and Gardner (2007), Wu and Srivastava (2011)], those in contin¬ 
uous state space, continuous time models [van Rossum (2001), Houghton 
and Sen (2008), Houghton (2009)], and a number of others [Schreiber et al. 
(2003), Kreuz et al. (2007), Quian Quiroga, Kreuz and Grassberger (2002), 
Hunter and Milton (2003), Paiva, Park and Principe (2009)]. 

An ongoing pursuit of great interest in computational neuroscience is 
defining an average that can represent tendency of a set of spike trains. 
What follows is the problem of defining basic summary statistics reflecting 
the intuitive properties of the mean and the variance, which are crucial for 
further statistical inference methods. In particular, to make the first-order 
statistic, mean, convenient for constructing new framework and inference 
methods, it should satisfy the following properties: 

1. The mean is uniquely defined in a certain framework. 

2. The mean is still a spike train. 

3. The mean represents the conventional intuition of average like in the 
Euclidean space. 

4. The mean depends on exact spike times only, and is independent of 
the recording time period. 

5. The mean can be computed efficiently. 

Property 3 can be described as follows: given a set of N spike trains with each 
having K spikes, we denote these trains using vectors {{xn,... ,XiK)}^i- 
Then the mean spike train is expected to resemble ^ 

In Victor and Purpura (1997), the authors considered a “consensus” spike 
train, which is the centroid of a spike train set (under the Victor-Purpura 
metric). This idea was further generalized in Diez, Schoenberg and Woody 
(2012) to a “prototype” spike train which does not have to belong to the 
given set of spike trains, but its spike times are chosen from the set of 
all recorded spike times. Recently, a notion of an “average” based on ker¬ 
nel smoothing methods was introduced in Julienne and Houghton (2013). 
In Wu and Srivastava (2011, 2013), the authors proposed an elastic metric 
on inter-spike intervals to define a mean directly in the spike train space. 
However, none of these approaches satisfies the 5 desirable properties listed 
above, and therefore may result in limited use in practical applications. 

In this article, we propose a new framework for defining the mean spike 
train. We adopt a recently developed metric related to an 1/ family with 
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p > 1, which inherits desirable properties in the special case of p = 2 [Dubbs, 
Seiler and Magnasco (2010)]. This metric is a direct generalization of the 
Victor-Purpura metric, and we refer to it as a GVP (Generalized Victor- 
Purpura) metric. We will demonstrate that this new mean spike train satis¬ 
fies all aforementioned 5 properties. In particular, the new framework is the 
only one (over all available methods) that has desirable Euclidean proper¬ 
ties on the given spike times. Such properties are crucial for the dehnition of 
summary statistics such as the mean, variance, and covariance in the spike 
train space. In general, these 5 properties assure intuitiveness of the sum¬ 
mary statistics, as well as efficiency in their estimation. In contrast, previous 
methods have issues such as nonuniqueness, dependence on model assump¬ 
tions, or more complicated computations, and therefore do not result in the 
same level of performance (see the detailed comparison in Section 2.5). 

One direct application of the mean spike train is in neural decoding in 
the rodent peripheral gustatory system [Wu et al. (2013)]. The neural data 
was recorded from single cells in geniculate ganglion, as the spiking activity 
in these neurons modulated with respect to different taste stimuli on the 
tongue. It is commonly known that spontaneous spiking activity can be 
observed even if only artificial saliva is applied. Thus, the neural response 
is actually a mixture of a background activity and a taste-stimulus activity. 
In this article we demonstrate using simulation as well as real data that the 
proposed new framework can be used to remove the background activity, 
which leads to improvement in decoding performance. 

In Section 2 we define the new framework by introducing the mean spike 
train under the GVP metric, and provide an efficient algorithm to estimate 
it. In Section 3 we extend this framework by developing a statistical ap¬ 
proach for noise removal and apply the method to the experimental data. 
We then discuss the merits of the new framework in Section 4. Finally, in 
the Appendix, we provide mathematical details on the convergence of the 
mean estimation algorithm. 

2. New framework. Before we turn to describing the methods, it is nec¬ 
essary to set up a formal notation in the spike train space. 

2.1. Notation. Assume 5 is a spike train with spike times 0 < si < S 2 < 

• • • < sm < T, where [0,T] denotes a recording time domain. We denote this 
spike train as 



We dehne the space of all spike trains containing M spikes to be Sm and 
the space of all spike trains to be *S = Um=o‘5m. This can be equivalently 
described as a space of all bounded, hnite, increasing sequences. 
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A time warping on the spike times (or inter-spike intervals) has been 
commonly used to measure distance between two spike trains [Victor and 
Purpura (1996), Dubbs, Seiler and Magnasco (2010), Wu and Srivastava 
(2011)]. Let r be the set of all time-warping functions, where a time warping 
is defined as an orientation-preserving diffeomorphism of the domain [ 0 ,r]. 
That; that is, 

r = {7 : [0, T] [0, T] | 7 ( 0 ) = 0, 7 (r) = T, 0 < jit) < 00 }. 

It is easy to verify that P is a group with the operation being the composition 
of functions. By applying 7 G P on a spike train S = one obtains a 

warped spike train 7 ( 5 ) = ( 7 ( 5 ^))^;^. 

2.2. GVP metric. In Dubbs, Seiler and Magnasco (2010), a new spike 
train metric was introduced with parameter pG [l,oo). This metric is a 
direct generalization of the classical Victor-Purpura (VP) metric (VP is 
a special case when p=l), and we refer to it as the Generalized Victor- 
Purpura (GVP) metric. In particular, when p = 2, this metric resembles a 
Euclidean distance. 

Assume that X = {xi)f£.^ and Y = are two spike trains in [0,r]. 

For A(> 0), the GVP metric between X and Y is given in the following form: 

/ \ 1/2 

(l)dGVp[A](V,y)=mm(EoR(V, 7 (y)) + A 2 ^ {xi-yjf] , 

where Eor{-, •) denotes the cardinality of the Exclusive OR (i.e., union minus 
intersection) of two sets. That is, £'or(V, 7 (y)) measures the number of 
unmatched spike times between X and 7 (y) and can be computed as 

M N 

Eor{X,^{Y)) = M + N-2 EE 

i=l j=l 

where Ip} is an indicator function. The constant A(> 0) is the penalty co¬ 
efficient. We emphasize that dcvp is a proper metric, that is, it satisfies 
positive definiteness, symmetry, and the triangle inequality. It shares a lot 
of similarities with the classical Lfi norm. 

Similarly to the result in Wu and Srivastava (2013), one can show that 
the optimal time warping between two spike trains X = {xi)fLi and Y = 
{yj)^=i must be a strictly increasing, piece-wise linear function, with nodes 
mapping from points in Y to points in X. Based on this fact, a dynamic 
programming algorithm was developed to compute the distance dcvp with 
the computational cost of the order of 0{MN). Using the bipartite graph 
matching theory, another efficient algorithm was also developed to compute 
dcvp hr the cost of 0{MN) [Dubbs, Seiler and Magnasco (2010)]. 
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2.3. Definition of the summary statistics and their properties. Conven¬ 
tional statistical inferences in the Euclidean space are based on basic quan¬ 
tities such as mean and variance. For statistical inferences in the spike train 
space, we can analogously use a Euclidean spike train metric to define these 
summary statistics as follows. 

For a set of spike trains 81 , 82 , ■■■, 8 k C S where the corresponding num¬ 
bers of spikes are ni,n 2 ,...,nx (arbitrary nonnegative integers), respec¬ 
tively, we define their sample mean using the classical Karcher mean [Karcher 
(1977)] as follows: 

K 

(2) 5* = argmin VdGVp[A](<S'fc,5')^. 

When the mean spike train 8 * is known, the associated (scalar) sample 
variance, ci^, can be dehned in the following form: 

1 ^ 

(3) ^ K -1 

^ k=l 

The computation of this variance is straightforward, and the main challenge 
is in computing the mean spike train for any A(> 0). 

Before we move on to the computational methods for the summary statis¬ 
tics, we list several basic theoretical properties of the mean spike trains using 
the dcvp nietric. The proofs are omitted here to save space: 

1. The optimal time warping between two spike trains must be a con¬ 
tinuous, increasing, and piece-wise linear function between subsets of spike 
times in these two trains. 

2. Let spike trains X = = {yi)f£i € Sm be defined on [0,r]. If 

A2 <l/(Mr2), then 

/ M 

dGYp[X]{X,Y)=ix^Y.(xi-yif 

\ i=i 

3. Assume a set of spike trains 81 , 82 , ■ ■ ■, 8 k with ni,n 2 , ■ ■ ■ ,nK 
spikes, respectively, and let A^max = max(ni, 712 ,..., n^-). If A^ < 1/ 

then the number of spikes in the mean train is the median 

of {nk}k=v 

4. Let spike trains 81 ,..., 8 k G Sm- If A^ < l/(ArMT^), then the mean 
spike train has a conventional closed-form: 
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2.4. Computation of the mean spike train. To compute the mean spike 
train S* under the GVP metric, we need to estimate two unknowns: (1) 
the number of spikes n, and (2) the placements of these spikes in [0,T]. For 
a general value of A > 0, neither the matching term nor the penalty term 
is dominant, and therefore we cannot identify the number of spikes in the 
mean before estimating their placements [Wu and Srivastava (2011)]. A key 
challenge is that we need to update the number of spikes in the algorithm. 
In this article, we propose a general algorithm to estimate the mean spike 
train. We initialize the number of spikes in the mean spike train equal to 
the maximum of {ni,n 2 ,... and then adjust this number during the 

iterations. We present, here, how the Karcher mean in equation (2) can be 
efficiently computed using a convergent procedure. 

2.4.1. Algorithm. Assume that we have a set of K spike trains. Si ,..., Sk 
with ni, n 2 ,..., uk spikes, respectively. Denote Sk = and S = . 

Then the sum of squared distances in equation (2) is 


K 

y~]dGVp[A](5^,5')^ 



We develop here an iterative procedure to minimize dGVp[A](5'^, 5)^ 

(as a function of S) and estimate the optimal S*. This new algorithm has four 
main steps in each iteration: Matching, Adjusting, Pruning, and Checking, 
and we refer to it as the MAPC algorithm. In particular, the Adjusting 
step corresponds to the Centering step in the MCP-algorithm in Wu and 
Srivastava (2013); in contrast to the nonlinear warping-based Centering- 
step, the Adjusting step utilizes the Euclidean property and updates the 
mean spike train in an efficient linear fashion. The Checking step is mainly 
used to avoid underestimating the number of spikes in the mean. This step 
adds one spike into the current mean and checks if such an addition results 
in a better mean (i.e., smaller mean squared distances). In contrast, such a 
problem is not addressed in the MCP algorithm. 

Matching-Adjusting-Pruning-Checking (MAPC) Algorithm: 

1. Let n = maxjni, n 2 ,..., uk}- (Randomly) set initial times for the n spikes 
in [0,T] to form an initial S. 

2. Matching step: Use the dynamic programming procedure [Wu and Srivas¬ 
tava (2011)] to find the optimal matching 7 ^ from S to S^,k = 1 ,... ,K. 
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That is, 


(5) 7^ = argmiiy^oR[*S'^,7(5')]+A^ ^ {4 - 

{Lhs?=7(sj)} 

3. Adjusting step: 

(a) For A; = 1,..., iC, j = 1,..., re, define 

s^, if die refc}, s.t. 7 ^(sj) = s,^, 

Sj, otherwise. 

(b) Denote = {r \,..., 4),k = 1,..., iC. Then we update the mean 
spike train to be 5 = -^ 

4. Pruning step: Remove spikes from the proposed mean S that are matched 
less than Rr/2 times. 

(a) For each j = l,...,re, count the number of times sj appears in 

That is, /ij = Ef=i 

(b) Remove Sj from S if hj < Rr/2,j = l,...,re, and denote the up¬ 
dated mean spike train as S*. Then 

S* = {sjeS\hj>K/2}. 



5. Cheeking step: To avoid being stuck in a local minimum, we check if 
an insertion or/and deletion of a specific spike can improve the mean 
estimation: 

(a) Let S* be S* except one spike with the minimal number of ap¬ 
pearances (randomly chosen if there are multiple spikes at the minimum) 
in the Pruning step. Then, update the mean as 

' K K 

I 5*, if J]dGVp[A](S^5*)'< J]dGVp[A](5^5*)^ 

k=l k=l 

, S*, otherwise. 

(b) Let S** be the current mean S** with one spike inserted at random 
within [0,T]. Then update the mean as 

' K K 

I 5", if J^dGVp[A](5^5**)2< J^dGVp[A](5^5**)^ 

k=l k=l 

, S**, otherwise. 

6 . Mean update: Let S = S*** and re be the number of spikes in S. 
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7. If the sum of squared distances stabilizes over steps 2-6, then the mean 
spike train is the current estimate and we can stop the procedure. Oth¬ 
erwise, go back to step 2. 

Denote the estimated mean in the mth iteration as One can show 

that the sum of squared distances (SSD), '^gvp de¬ 

creases iteratively as a function of m. As 0 is a natural lower bound, the SSD 
will always converge when m gets large. The detailed proof is given in the 
Appendix. Note that this MAPC algorithm takes only linear computational 
order on the number of spike trains in the set. In practical applications, 
we find that this algorithm has great efficiency in reaching a reasonable 
convergence to a mean spike train. 

In general, when the penalty coefficient A gets large, the optimal time 
warping chooses to have fewer matchings between spikes to lower the warping 
cost. Some of the spikes in the mean will be removed during the iterations to 
minimize the SSD. In the extreme case, when A is sufficiently large, any time 
warping would be discouraged (as that will result in a larger distance than 
simply the Exclusive OR operation). In this case, the mean spike train will 
be an empty train. This result indicates that in order to get a meaningful 
estimate of the mean spike train, the penalty coefficient A should not take a 
very large value. In practical use, one may use a cross-validation to decide 
the optimal value of A. 

2.4.2. Illustration of the MAPC algorithm. To test the performance of 
the MAPC algorithm, we illustrate the mean estimation using 30 spike trains 
randomly generated from a homogeneous Poisson point process. Let the total 
time T = 1 (sec), the Poisson rate p = 8 (spikes/sec). The individual spike 
trains are shown in Figure lA. The number of spikes in these trains varies 


A 


B 


C 



Fig. 1. A; 30 spike trains generated from a homogeneous Poisson process. Each vertical 
line indicates a spike. B; Estimation results when = 6. Upper panel: The sum of squared 
distances (SSD) over all iterations. Lower panel: The estimated mean spike train over all 
iterations. The initial is the spike train on the top row (0), and the final estimate is the 
spike train on the bottom row (12th). C/ Estimation result when X^ = 60. 
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from 3 to 13, with the median value of 9. Therefore, n, the number of spikes 
in the mean, is initialized to be 13 and we adopt randomly distributed 13 
spikes in [0, T] as the initial for the mean in each case. We let vary between 
6 and 60 to show the behavior for a small and a large warping penalty. 

The result for the MAPC algorithm for = 6 is shown in Figure IB. 
The upper panel shows the evolution of the SSD in equation (2) versus the 
iteration index. We see that it takes only a few iterations for the SSD to 
decreasingly converge to a minimum. The estimated mean spike trains over 
iteration are shown in the lower panel. Apparent changes are observed in the 
first few (1 to 5) iterations, and then the process stabilizes. Note that the 
spikes in the mean train are approximately evenly spaced, which properly 
captures the homogeneous nature of the underlying process. We also note 
that the number of spikes in this mean spike train is 9, which equals the 
median of the numbers of spikes in the set. 

The result for A^ = 60 is shown in Figure 1C. With a larger penalty, the 
optimal time warping between spike trains chooses to have fewer matchings 
between spikes to lower the warping cost. Some of the spikes in the mean 
are removed during the iteration. In this case, the convergent SSD is about 
150, which is greater than the SSD when A^ = 6 (at about 80). Note that 
when A is even larger, we expect fewer or even no spikes to appear in the 
estimated mean. 

2.5. Advantages over previous methods of averaging spike trains. There 
have been multiple ideas of capturing the general trend in a set of spike 
trains, which include the “consensus” spike train [Victor and Purpura (1997)], 
the “prototype” spike train [Diez, Schoenberg and Woody (2012)], the “aver¬ 
age” spike train [Julienne and Houghton (2013)], and the “mean” spike train 
[Wu and Srivastava (2011)]. However, none of those concepts satisfies all de¬ 
sirable 5 properties of a mean spike train listed in the Introduction section. 
We have summarized the most relevant differentiating features in Table 1. In 
the case of the “consensus” and “prototype” spike trains, one main problem 
lies in the nonuniqueness of the results in the spike train space, which arises 
directly from the underlying metric used (resembles Manhattan distance). If 
the estimated spiking times of those averages are restricted to spiking times 
in the sample sets, then the estimates can be unreliable, particularly when 
sample sizes are relatively small. The “average” design uses the van Rossum 
metric, which relies on kernel-smoothing of the spike trains. The estimation 
of the “average” is based on a greedy algorithm, but the accuracy and the 
kernel dependence of the method have not been carefully examined. In this 
article, we propose a new notion of a “mean” spike train based on the kernel- 
free GVP metric. The key advantages behind our design are the Euclidean 
properties of GVP distance and the subsequent Karcher mean definition [in 
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Table 1 

Comparison of average of spike trains for different methods 


Method 

“mean” 

(proposed 

framework) 

“consensus” 

Victor and 
Purpura 
(1997) 

“prototype” 
Diez, Schoen¬ 
berg and 
Woody (2012) 

“average” 
Julienne and 
Houghton 
(2013) 

“mean” 

Wu and 
Srivastava 
(2011) 

Metric used 

GVP metric 

Victor- Victor- 

Purpura metricPurpura metric 

van Rossum 

metric 

Elastic 

metric 

Properties (in 
Introduction) 
satisfied 

1, 2, 3, 4, 5 

2, 4, 5 

2, 4, 5 

1, 2,4 

1, 2, 5 

Domain 

Full spike 

Given 

Given 

Full spike 

Full spike 


train space 

sample 

set 

spike times 
set 

train space 

train space 

Number of spikes, 
A2<1 

Median of 
{m,... ,nAr} 

NA 

NA 

NA 

median of 
{m,.. .,nN] 

Spike times if 
ni = • • • = njv, 
A2<1 

1 

Cj — Iv Z^fe = l 

Restricted 
to sample 
set 

Restricted 
to sampled 
spike times 

NA 

ISTbased 

nonlinear 

form 

Uniqueness in the 
full space 

Almost surely 

Nonunique 

Nonunique 

Not known 

Almost 

surely 


equation (2)]. The new framework satisfies all 5 desirable properties, which 
distinguishes it from others. 

It is worth noting that, to the best of our knowledge, the GVP metric is 
one of only two spike train metrics that have Euclidean properties [the other 
one is the Elastic metric proposed in Wu and Srivastava (2011)]. However, 
the Elastic metric satisfies only 3 out of the 5 properties, and the GVP metric 
has two apparent advantages over it: first, the Elastic metric estimates the 
mean spike train explicitly depending on recording intervals. Such depen¬ 
dence may introduce an additional level of noise arising from experimental 
parameters, making the inference less reliable. In contrast, the mean spike 
train under the GVP metric relies only on exact spike times in the given data 
and is independent of recording intervals (Property 4). Second, the fact that 
the Elastic mean is estimated through the inter-spike intervals (ISI) makes it 
difficult to capture the intuition behind the result, whereas the GVP mean 
is estimated directly through spike times and resembles the intuition of the 
mean estimation (Property 3). 

Eor illustrative purposes, we compare spike train “averages” of all methods 
using the 30 spike trains in Section 2.4.2, where the data is simulated under 
a homogeneous Poisson process. A natural expectation is that these averages 
should be equi-distantly spaced across the time domain. We adopt a simple 
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GVP Mean {SD,si = 0.019) 


0 1 
time (sec) 

“Prototype” {SDisi = 0.037) 


0 1 
time (sec) 


Elastic Mean {SDisi = 0.029) 


0 1 

time (sec) 

“Consensus” {SD,si = 0.124) 

1 1 I_LLLTT 

0 1 

time (sec) 


Fig. 2. Averaged spike trains according to four different methods. 


measure for the equi-distant spacing—compute the standard deviation of the 
ISI in each train, denoted by SDisi- Basically, smaller SDisi values indicate 
more even spacing. In Figure 2, we show the averages estimated using the 
GVP mean, Elastic mean, “Prototype,” and “Consensus” methods. (In the 
GVP and Elastic methods, we let penalty coefficients be sufficiently small.) 
It is found that SDisi in the GVP mean is 0.019, the smallest over all four 
methods. 

3. Application in noise removal. The notion of the mean spike train has 
a direct application in neural decoding. In this article we examine how the 
mean can be used to remove spontaneous activity in geniculate ganglion 
neurons and improve decoding performance. 

3.1. Noise-removal method. Geniculate ganglion neurons exhibit spiking 
response to the chemical stimulus applied on the taste buds on the tongue. 
Such neuronal activity is commonly used in the neural coding in the periph¬ 
eral gustatory system [Di Lorenzo, Chen and Victor (2009), Breza, Nikonov 
and Contreras (2010), Lawhern et al. (2011)]. We note that these neurons 
exhibit responses even if there is no stimulus applied or the stimulation is 
a control solution—artificial saliva. That is, the observed spike trains under 
the stimulation are likely to be a mixture of the spontaneous activity and 
responses to the taste stimuli. In the context of neural decoding, such spon¬ 
taneous activity can be viewed as “background noise,” and a “de-noised” 
spiking activity is expected to better characterize the neural response with 
respect to the taste stimulus and result in better decoding performance. 

Previous approaches to the noise-removal focus mainly on spike count 
across a time domain and do not have a temporal matching between spikes. 
In this paper, we propose a novel noise-removal procedure based on our new 
framework. In Figure 3 we describe the schematic idea of incorporating the 
noise removal in statistical inference. The procedure assumes that the ob¬ 
served data is a “sum” of isolated neuron responses and their spontaneous 
activity. To improve the neural decoding, we at first use the stimulus-free 
spike recordings to estimate the mean background noise with the MAPC 
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Fig. 3. Scheme differentiating the noise-removal approach from standard inference on 
spike train data. Dashed boxes indicate the components of standard inference framework, 
the solid lines indicate where the noise-removal framework is introduced. 

algorithm. Then, we “subtract” the mean out from the observed stimulus- 
dependent data. Obviously, for random variables X, Y in vector spaces one 
cannot assume that X = X -\-Y — Y (Y denotes the mean of T) is a noise re¬ 
duced “version” of X -\-Y. However, in the space of spike trains we managed 
to establish this procedure, utilizing the warping matchings on the GVP 
mean. The procedure of obtaining X = X ®Y QY indeed gives a noise- 
reduced version X of a point pattern X ®Y. This approach is possible with 
definitions of the addition © and the subtraction 0 in the spike train space 
as follows: 

Adding spike trains. We assume that the noise is additive and adding 
spike trains is achieved by union set operation. That is, let X = (xi,..., xat) 
and Y = {yi,... ^yu) be two spike trains of length N and M, respectively. 
We define a spike train Z = X (BY as a spike train of length N -\- M such 
that 

Z = X(B>Y = ({xi,...,XAr}U{yi,...,yM})- 

Subtracting spike trains. Defining the subtraction is more challenging, as 
it cannot follow directly from the set operations. This is due to the fact that 
it is unlikely to have coinciding spike times in two different spike trains. 
To perform the subtraction, we turn to the definition of the GVP metric 
and optimal warping between two spike trains [equation (1)]. We define the 
subtraction of a spike train Y from a spike train X as removing all spike 
positions from X that are matched with spikes in Y under the optimal 
warping 7 . We say that a spike pair (xi,yj) is matched if Xi = ^{yj) for 
some pair 

Formally, let X = (xi,..., xn),Y = (yi,..., yjvf) be two spike trains and 
7 be the optimal warping between them according to the dcvp nietric. We 
define the subtraction of Y from X, noted by Z = X QY, as follows: 

Z = X QY = ({xi,.. .,XAr} \ {xi : Xi = 'y{yj) for some pair {i,j)}). 

Once the © and © are established, we can describe the noise-removal 
method as follows: we “subtract” the mean(y) from the observed X QY 
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using the matching of the GVP metric and obtain a spike train X © y 0 
mean(y) by removing matched spikes. We at first use a simulation for illus¬ 
tration of both © and © operations in Section 3.2. The new method is then 
applied to a real experimental data set in Section 3.3. 


3.2. Result for simulated data. To illustrate the noise-removal frame¬ 
work, we first generate 40 independent realizations of a homogeneous 

Poisson process on [0,2] with constant intensity a = 10. These simulations 
represent the noise and are used to estimate the mean background noise fi 
with the MAPC algorithm. The results are shown in Figure 4A. 

Next we generate two sets of 20 independent spike trains, {Xi}j^^ and 
as realizations of an inhomogeneous Poisson processes (IPP) with 
intensity functions px{t) = exp(—(t — 1.5)^) and /9y(t) = exp(—(t — 0.5)^), 
respectively. The generated spike trains are shown in Figure 4B. In our 
framework they correspond to the underlying true neuronal signals. 

In the third step we obtain the equivalent of the “observed” data, by 
adding the previously generated noise for each generated pi to the corre¬ 
sponding spike trains Xi and Yi. The combined results are shown in Fig¬ 
ure 4C. In this case adding spike trains is understood in the set opera¬ 
tions terms. We obtain spike trains following Poisson processes Xi © pi ~ 
IPP(/OA + (y),Yi® p.i+20 ~ IPP(py + a). 

The mean background noise spike train fi is then subtracted out from each 
realization of the noised data set, according to the procedure described in 
Section 3.1. For each z = 1,..., 20, we obtain the noise removed spike trains: 
Xi® Pi Q jl,Yi® pi +20 © Aj shown in Figure 4D. 



Fig. 4. Illustration of the noise addition and the noise removal with the use of the ©,0 
operations. A; Background noise—fO spike trains generated from HPP(IO); the mean back¬ 
ground noise is presented with dashed lines in the bottom row. B; 2 x 20 spike trains from 
IPP(px) (asterisks) and IPP(py) (circles), respectively. C: Sum of spike trains from A 
and B. D.- Spike trains after the background noise removed. 
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Fig. 5. The noise-removal influence on classification performance with respect to in¬ 
creasing noise level a. A: The classification performance for the noisy data. The bold 
lines represent the average classification score among 50 simulations, the dotted lines in¬ 
dicate the standard deviation from the average classification score. B Same as A, but for 
the noise-removed data. C; Mean classification score curves from A (dashed line) and B 
(dotted line). 


We repeat this simulation procedure 50 times for each level of a G [2,20] 
and perform classification on the noisy data Xi (B Q) fii +20 as well as 
on the noise-removed: Xi(B fiiQ ft, Yi 0 /ii+ 2 o © A- The classification score is 
obtained by a standard leave-one-out cross-validation. We record the aver¬ 
age score (the classification accuracy) with the standard deviation for each 
a level; the result is shown in Figure 5A, B. As anticipated with the increas¬ 
ing noise intensity a, the classification performance on each of the noisy and 
noise-reduced data sets declines. However, if we compare the two average 
classification scores presented in Figure 5C, we see that the noise-removal 
framework outperforms the classification on the noisy data once the noise 
intensity level a becomes not negligible. This result indicates that the pro¬ 
posed noise-removal procedure can help increase the contrast between dif¬ 
ferent classes and result in an improvement in classification analysis. Next, 
we will examine this procedure on a real experimental data set. 

3.3. Result in real data in gustatory system. Here we apply the noise- 
removal procedure to neural response in the gustatory system and test if 
the decoding (i.e., classification with respect to taste stimuli) can improve 
after the spontaneous activity is removed. The data consists of spike train 
recordings of rat geniculate ganglion neurons and was previously used in Wu 
et al. (2013). Briefly, adult male Sprague-Dawley rat’s geniculate ganglion 
tongue neurons were stimulated with 10 different solutions over a time period 
of 5 seconds: 0.1 M NaCl, 0.01 M citric acid (CA), 0.003, 0.01, 0.03, and 0.1 
M acetic acid (AA), and each AA mixed with 0.1 M NaCl. Each stimulus was 
presented 2-4 times. Stimulus trials were divided into three time regions: a 
5-second pre-stimulus period, a 5-second stimulus application period, and a 
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Fig. 6. An example of spike trains from Cell 10. Each group of 3 or 4 rows corresponds 
to a different type of stimuli applied. A; The 5-second pre-stimulus spike trains, whose 
mean spike train, calculated by the MAPC algorithm, is shown by the thick vertical bars 
on the top of the panel. B; The 5-second stimulus period. C; The same 5-second period of 
spike trains as in B, but with spontaneous activity subtracted out. 

5-second post-stimulus period. During the first and third regions, a control 
solution of artificial saliva was applied. During the stimulus period, one 
of the 10 aforementioned solutions was applied. In this study we focus on 
classifying the given spike trains according to the 10 stimuli presented in each 
of 21 observed neurons. In Figure 6A, B we present the real data recordings 
in the Hrst and second time regions from one example neuron. 

The spike trains in the pre-stimulus 5-second period reflect spontaneous 
activity with artihcial saliva applied. They are treated as “noise” data, in 
contrast to stimulus-dependent responses. We compute their mean spike 
train with the parameter = 0.001 (a small value to get more spikes in 
the mean). The result is shown in the top row in Figure 6A. This mean 
properly summarizes spiking activity during the pre-stimulus period. In the 
next step, we subtract out this mean noise from the data during the stimulus 
period (spike trains between the 5th and 10th second). The noise-removed 
spike trains are shown in Figure 6C. 

We can now compare the decoding performance using the observed 
stimulus-response data and the “noise-removed” data. To reliably evaluate 
classification scores, we take the approach of leave-one-out cross-validation. 
In both cases of the observed data and the noise-removed data, the class 
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Fig. 7. The result of the noise-removal procedure applied to each of the recorded 21 
cells. The marker coding is the same for both panels and indicates the influence of the 
noise-removal approach on the classification score: black circles — increase, grey diamond- 
s — decrease, black asterisks — unchanged. A. Raw classification scores for each cell in each 
condition. B. Same result as in A, but in terms of classification score increase with respect 
to mean noise size. The vertical black line corresponds to the noise size cutoff of 10 spikes. 


is assigned according to the nearest neighbor’s class under the dcvp rnet- 
ric. In this classification analysis, we use = 225, a relatively larger value 
to emphasize the importance of both matching term and penalty term in 
equation (1). 

The comparison on the classification accuracy is shown in Figure 7A. In 
10 out of 21 cells the classification was improved after the noise-removal 
procedure and in only 4 cells the classihcation was hindered. Classification 
in 7 cells remained unchanged after noise removal, which seems to be quite 
significant. To explain this issue, we have investigated the size of the mean 
background noise and its influence on increase in classification performance. 
It turned out, as seen in Figure 7B, that in 5 out of these 7 unchanged cells, 
the pre-stimulus spiking is negligible (the estimated mean spike train has 0 
or 1 spike). In those cases, obviously, subtracting out the mean noise spike 
train will bare minimum influence. 

The remaining two cases are associated with the opposite problem of the 
noise size—the number of spikes in the pre-stimulus period is comparable 
or greater than the number of spikes in the stimulation period. When such 
mean noise is subtracted out, it also can take away relevant information, thus 
it may not improve the decoding. Those noise size issues are consistent with 
common intuition behind the noise removal. However, it is worth noting that 
the size of the estimated mean background noise can be controlled in our 
new framework, by adjusting the penalty parameter A (Section 2.4). More 
investigation will be conducted on the selection of A in our future work. 
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When focusing on cells that have significant noise influence in their spik¬ 
ing pattern (at least 10 spikes in the estimated mean noise spike train), 
we see that in the majority of cases (8 out of 13) the noise removal has 
improved the classification score (Figure 7B). In extreme cases we obtain 
up to 20% improvement in the decoding performance. (Note that with 10 
different stimuli, a random guess results in 10% average accuracy.) Only 3 
out of the 13 cells indicate loss of information and 2 are not influenced by 
noise removal. 

In summary, we find that our notion of mean background noise for spike 
train data is in agreement with the common understanding of the additive 
noise for the majority of recorded neurons. Moreover, the proposed noise- 
removal framework effectively improves neural decoding, provided that the 
pre-stimulus spiking has a high enough intensity. 

4. Discussion. In this article we propose a new framework for defining 
the mean of a set of spike trains and the deviation from the mean. We pro¬ 
vide an efficient algorithm for computing the mean spike train and prove 
the convergence of the method. The framework is based on the dcvp metric 
[Dubbs, Seiler and Magnasco (2010)] which resembles the Euclidean dis¬ 
tance. This concept gives an intuitive sample mean point pattern, in which 
the spike positions in the mean are averaged among matched spikes in a set 
of all spike trains. 

Our summary statistics provide the basis for inference on point pattern 
data and in this article we utilize it to develop a mean-based noise-removal 
approach. We show that our procedure improves the classihcation score for 
simulated inhomogeneous Poisson point process data with various nonneg- 
ligible noise levels. We have also applied the new tools to a neural decoding 
problem in a rat’s gustatory system. It is found that the mean point pattern 
approach and the noise-removal framework significantly improved the neural 
decoding among the set of 21 neurons. 

In the noise-removal framework, we have defined the operations of ad¬ 
dition and subtraction between spike trains with the use of the matching 
component of the GVP metric. We, however, note that those operations do 
not satisfy the law of associativity. For more advanced analysis, it is desir¬ 
able to establish an algebraic structure on the space of point patterns. Thus, 
refining those approaches will be pursued in the further work. 

Once the algebraic structure is established, statistical models can be built 
and regression analysis can be performed. With this setting and the already 
developed mean and the deviation from the mean approaches, we expect to 
develop classical statistical inferences such as hypothesis tests, confidence 
intervals/regions, FANOVA (functional ANOVA), FPCA (functional PCA), 
and regressions on functional data [Ramsay and Silverman (2005), Valder- 
rama (2007)]. All these tools are expected to provide a new methodology 
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for more effective analysis and modeling of neural spike trains or any point 
pattern data in general. 


APPENDIX 

Theorem 1 (Convergence of the MAPC algorithm). Denote the esti¬ 
mated mean in the mth iteration of the MAPC-algorithm as Then the 

sum of squared distances dGVp('S '^)decreases iteratively. That 

is, 

K K 

k=l k=l 


Proof. The proof will go through steps 2-6 of the algorithm—in each 
step we will show that the overall distance to the proposed mean = 

, (m) {'m)\ ■ 

(s]^ ,...,Sn ) IS nonmcreasmg: 

1. Matching: In the mth iteration, we find the optimal matching 7 ^ from 
to for each /cG AT}. Having those, we can write 

K 

Y,dGWp{S\S^^^f 

k=l 

= Eok{S\iHS^Y) + a' E E 


2. Adjusting: By definition, we update to 5*^™^ = ..., = 

^ Ef=i Rk, where Rk = {r'l,..., r^) with 

s\, if {!,..., nfe}, s.t. j^{sY) = s^, 
otherwise 

k = l,...,K,j = l,...,n. 


r^ = 
j 




Hence, Ef.i = ELi > ELl 

E7i(eJ-5fd. 

Let 7 be the piecewise linear warping function from to that 

is, = 7 ( 5 '(™)). Then 


K 

YdGYPiS^S^Y^ 

k=l 


EUCLIDEAN SUMMARY STATISTICS IN THE NEURAL SPIKE TRAIN SPACE19 


K 


K n 


> Eor(S‘, 7 ‘(S‘’”’)) + EE 




k=l 

K 


k=lj=l 

K 


>Y,Eor{S^7HS^^^)) + X^Y1 E ( 


k -(" 1)72 


Si — S 


k=l 

K 




K 


Y,Eor{S\ 7^ + X^Yl 


k=l 

K 


E 




>E‘'gvp(S‘A‘’”’)' 


k=l 
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5*'™'^ in which all spikes appear in { 7 ^( 5 '^)}^]^ at least iC/2 times. Based on 
the result in the Adjusting step, we have '^k=i^GY'p{S^■,3^'^^)'^ > 

Ef=i£^OR(S‘.7'= 0 7-‘(S<"’>)) + - ‘P' f- 

Using the basic rule of the Exclusive OR, it is easy to find that 
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Let ..., where n* denotes the number of spikes in 
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4. Checking: Finally, we perform the checking step to avoid the possible 
local minima in the pruning process. In the test if a spike can be removed 
from we let be except one spike with a minimal number 
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of appearances. Then update the mean spike train as 




< 




d*(m) 

\ ? 


K K 

k=l k=l 

otherwise. 


It is easy to verify that ^gvp('S'^, > Ylk=i^GVp{S’^, . In 

the test if a spike can be added to S**^^\ we let be 5**("*) with one 

spike inserted at random within [0,T]. Then update the mean spike train as 




< 






K K 

if J]; dGVP {s ^, < Y, dcYP {s’^, f, 

k=l k=l 

otherwise. 


It is easy to see that dGVp(<S'^) dGVp(5'^, 5***^™'))^. Us¬ 
ing step 6, the mean at the (m -|- l)th iteration is Hence, 
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